Abstract

Sparse coding has been widely used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics. However, the conventional sparse coding algorithms and their manifold-regularized variants (graph sparse coding and Laplacian sparse coding) learn the codebook and codes in an unsupervised manner and neglect the class information available in the training set. To address this problem, we propose a novel discriminative sparse coding method based on multi-manifold, which learns discriminative class-conditional codebooks and sparse codes from both the data feature space and the class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate sparse coding as a manifold-manifold matching problem and learn class-conditional codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a strategy based on the data point-manifold matching error to classify unlabeled data points. Experimental results on somatic mutation identification and on breast tumor classification in ultrasonic images demonstrate the efficacy of the proposed data representation-classification approach.



1 Introduction 

Sparse coding (Sc) [1] has been successfully applied in many pattern recognition applications as a part-based data representation method, such as face recognition [5], speech recognition [3], handwritten digit recognition [?] and image clustering [3]. Given a set of data feature vectors organized as an input data matrix, Sc aims at finding a pool of basis vectors (also known as a codebook) and selecting as few basis vectors as possible from the codebook to linearly reconstruct the data feature vectors, while keeping the reconstruction error as small as possible [1].

Due to the "overcomplete" or "sufficient" characteristic of the codebook learned by Sc, the locality of the data points to be encoded might be ignored. As a result, similar data vectors may be represented by totally different sparse codes based on such codebooks, which makes the sparse coding unstable and harms the robustness of sparse coding based pattern recognition applications [4]. To overcome this disadvantage, Graph regularized Sparse Coding (GraphSc) and Laplacian Sparse Coding (LSc) have been proposed by Zheng et al. [4] and Gao et al. [5], [6], respectively. In both methods, the local geometrical structure of the dataset is explicitly explored by building a k-nearest neighbor graph, and the graph Laplacian is used as a smoothing operator to preserve the local manifold structure. Thus, the learned sparse codes vary smoothly along the geodesics of the data manifold [4], [5], [6].

For most pattern recognition tasks, such as somatic mutation identification [7] and breast tumor classification [8], class labels are available for the training set. Using these class labels, more discriminative sparse codes can be expected to be learned in a supervised manner. However, LSc and GraphSc are both unsupervised algorithms, so they do not utilize the class labels and ignore the discriminative information contained in them. Moreover, both GraphSc and LSc assume that the data points from different classes define a single general manifold in the feature space, and seek a common codebook and coding strategy for all data points so that nearby points are likely to have similar codes. However, as argued by Lu et al. [?], "it is still unknown that whether a single manifold could well model the data and guarantee the best recognition accuracy", so this assumption is arguably not the most suitable.

To solve the problems mentioned above, we assume that the optimal codebook and coding strategy for each class should be different due to the intrinsic differences between classes, and propose a novel supervised sparse coding method that learns discriminative codes from both the data features and the class labels. We model the data points from each class as a manifold so that we can learn an optimal codebook and codes for each specific class. First, we partition the entire data set into several class-conditional subsets according to the labels, and assume that each subset lies on a class-conditional manifold, which should be spanned by an independent class-conditional codebook. Instead of regularizing the codes with a single manifold as in LSc and GraphSc, we apply a multi-manifold framework for sparse coding regularization, in which a manifold is estimated for each class. Then, we formulate sparse coding as a class-conditional data feature reconstruction and manifold-manifold matching problem, and learn multiple codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a strategy based on the data point-manifold matching error to classify unlabeled data points. Experimental results on breast tumor classification in ultrasonic images [8] and somatic mutation identification [7] demonstrate the efficacy of the proposed data representation-classification approach.

2 Discriminative Sparse Coding on Multi-Manifold (DisScMM)



In this section, we will introduce the newly proposed sparse coding method on 
multi-manifold. 

2.1 Objective Function

Let us denote the training data set as $X = \{x_i\}$, $x_i \in \mathbb{R}^D$, $i = 1, \dots, N$, where $N$ is the number of data points and $D$ is the dimensionality of the feature vectors, and the class labels as $y = \{y_i\}$, $y_i \in \mathcal{L}$, $i = 1, \dots, N$, where $\mathcal{L} = \{1, \dots, L\}$ is the set of class labels. We first divide the data set $X$ into $L$ class-conditional subsets $X_l = \{x_i \mid y_i = l, x_i \in X\}$ according to the class labels. Let $X_l$ be the data set of the $l$-th class, represented by a manifold $\mathcal{M}_l$. The objective function of DisScMM is composed of two terms as follows.
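
The partition into class-conditional subsets can be sketched in a few lines. This is only an illustration, assuming the data are held as a list of feature vectors `X` with a parallel list of integer labels `y`; the function name `partition_by_class` and the toy data are hypothetical and not part of the original method description.

```python
from collections import defaultdict


def partition_by_class(X, y):
    """Group feature vectors by class label: returns {l: [x_i with y_i == l]}."""
    subsets = defaultdict(list)
    for x_i, y_i in zip(X, y):
        subsets[y_i].append(x_i)
    return dict(subsets)


# Toy data (made up): N = 5 points in R^2 with labels from {1, 2}.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [0.1, 0.1]]
y = [1, 1, 2, 2, 1]
subsets = partition_by_class(X, y)
```

Each value of `subsets` plays the role of one $X_l$; the order of points within a subset follows their order in `X`.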

2.1.1 Sparse Reconstruction Loss Term 

Different from traditional Sc methods, we represent the data points in each class with a class-conditional codebook, so that they can be better separated when the codebooks and codes are selected to be different in the low-dimensional code spaces. Given a class-conditional data set $X_l$, let $U_l = [u_{l1}, \dots, u_{lK}] \in \mathbb{R}^{D \times K}$ be its class-conditional codebook matrix, where each $u_{lk} \in \mathbb{R}^D$ represents a code word vector in the codebook, and let $v_{li} \in \mathbb{R}^K$ be the coefficient vector of $x_i \in X_l$, which is the sparse code of this data point. Each data point $x_i \in X_l$ can be reconstructed as a sparse linear combination of the code word vectors in the codebook as $x_i \approx U_l v_{li}$. A good code $v_{li}$ together with the codebook $U_l$ should minimize the reconstruction loss and also keep the reconstruction coefficients as sparse as possible, which can be formalized as

$$
\min_{U_l, V_l} R(U_l, V_l) = \sum_{x_i \in X_l} \left( \|x_i - U_l v_{li}\|^2 + \alpha \|v_{li}\|_1 \right) \quad \text{s.t. } \|u_{lk}\|^2 \le c, \ k = 1, \dots, K, \tag{1}
$$

where $V_l$ is the coefficient matrix, each column of which is the sparse representation of a data point, and $\|v_{li}\|_1$ is the $\ell_1$ norm measuring the sparseness of $v_{li}$.

2.1.2 Large Margin Term

Given a sample $x_i \in X_l$ belonging to the $l$-th class, two kinds of neighbors in the data set $X$ are considered: intra-class neighbors $\mathcal{N}_i^{\mathrm{intra}}$ and inter-class neighbors $\mathcal{N}_i^{\mathrm{inter}}$. The intra-class neighbors of $x_i$ are the $p$ nearest data points from the same class as $x_i$, while the inter-class neighbors are the $p$ nearest data points from classes different from that of $x_i$. Using a Gaussian kernel, we first define the class-conditional intra-class affinity matrix $W_l^{\mathrm{intra}}$ and inter-class affinity matrix $W_l^{\mathrm{inter}}$ to characterize the similarity between $x_i \in X_l$ and its neighbors in $\mathcal{N}_i^{\mathrm{intra}}$, as well as that between $x_i \in X_l$ and its neighbors in $\mathcal{N}_i^{\mathrm{inter}}$, respectively:

$$
\begin{aligned}
W^{\mathrm{intra}}_{l,ij} &=
\begin{cases}
\exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right), & \text{if } x_i \in X_l \text{ and } (x_j \in \mathcal{N}_i^{\mathrm{intra}} \text{ or } x_i \in \mathcal{N}_j^{\mathrm{intra}}), \\
0, & \text{otherwise,}
\end{cases} \\
W^{\mathrm{inter}}_{l,ij} &=
\begin{cases}
\exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right), & \text{if } x_i \in X_l \text{ and } (x_j \in \mathcal{N}_i^{\mathrm{inter}} \text{ or } x_i \in \mathcal{N}_j^{\mathrm{inter}}), \\
0, & \text{otherwise,}
\end{cases}
\end{aligned} \tag{2}
$$

where $\sigma$ denotes the bandwidth of the Gaussian kernel.
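
As an illustration of how the two affinity matrices in (2) could be built, the following NumPy sketch computes them over a whole labelled data set. It rests on several assumptions that are not fixed by the text: the kernel exponent is taken as $\|x_i - x_j\|^2 / (2\sigma^2)$, the matrices are built globally rather than per class (the rows of class $l$ would then be selected to obtain $W_l$), and the names `affinity_matrices`, `p` and `sigma` as well as the toy data are hypothetical.

```python
import numpy as np


def affinity_matrices(X, y, p=2, sigma=1.0):
    """Return (W_intra, W_inter) for points X (N x D) with labels y (N,).

    W_intra[i, j] > 0 when j is among the p nearest same-class neighbors of i
    or i is among those of j; W_inter is the analogue for other classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N = X.shape[0]
    # Squared Euclidean distances between all pairs of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian kernel (assumed form)
    W_intra = np.zeros((N, N))
    W_inter = np.zeros((N, N))
    for i in range(N):
        same = np.flatnonzero((y == y[i]) & (np.arange(N) != i))
        diff = np.flatnonzero(y != y[i])
        for idx, W in ((same, W_intra), (diff, W_inter)):
            nearest = idx[np.argsort(d2[i, idx])[:p]]
            # Fill both (i, j) and (j, i): the "or" condition in (2).
            W[i, nearest] = G[i, nearest]
            W[nearest, i] = G[nearest, i]
    return W_intra, W_inter


# Toy data (made up): two well-separated classes in R^2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([1, 1, 2, 2])
W_intra, W_inter = affinity_matrices(X, y, p=1, sigma=1.0)
```

Both matrices come out symmetric by construction, which matters later because the graph regularized solver of [4] requires a symmetric weight matrix.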

From the viewpoint of classification, the intra-class variance should be minimized while the inter-class separability should be maximized in the sparse coding space, so that the class margin can be maximized. To this end, the large margin term of sparse coding is formulated as the following optimization problem for the $l$-th class:

$$
\min_{V_l} M(V_l), \quad M(V_l) = \frac{1}{2} \sum_{i} \Big( \sum_{j} \|v_{li} - v_{lj}\|^2 W^{\mathrm{intra}}_{l,ij} \Big) - \frac{1}{2} \sum_{i} \Big( \sum_{j} \|v_{li} - v_{lj}\|^2 W^{\mathrm{inter}}_{l,ij} \Big) \tag{3}
$$

On the one hand, the first term of $M(V_l)$ in (3) ensures that if $x_i$ and $x_j$ are close and from the same class, then their class-conditional sparse codes $v_{li}$ and $v_{lj}$ are close as well. On the other hand, the second term of $M(V_l)$ in (3) ensures that if $x_i$ and $x_j$ are close and from different classes, then their class-conditional sparse codes $v_{li}$ and $v_{lj}$ are separated as far as possible.



2.1.3 Objective Function of DisScMM

To construct the objective function, we first construct the class-conditional manifold by including the intra-class and inter-class neighbors of the data points $x_i \in X_l$, as $\mathcal{M}_l = \bigcup_{x_i \in X_l} \left( \{x_i\} \cup \mathcal{N}_i^{\mathrm{intra}} \cup \mathcal{N}_i^{\mathrm{inter}} \right)$. The data points in the manifold of the $l$-th class are organized as a data matrix $X_l = [x_n] \in \mathbb{R}^{D \times N_l}$, $n = 1, \dots, N_l$, $x_n \in \mathcal{M}_l$, where $N_l = |\mathcal{M}_l|$ is the number of data points in $\mathcal{M}_l$. The corresponding sparse coding coefficient matrix is denoted as $V_l = [v_{ln}] \in \mathbb{R}^{K \times N_l}$, where each column $v_{ln}$ is the sparse representation of a data point $x_n$. Then, combining the two objective function terms defined in Sections 2.1.1 and 2.1.2, we obtain the objective function of DisScMM as

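$$
\begin{aligned}
O(U_l, V_l) &= R(U_l, V_l) + \beta M(V_l) \\
&= \|X_l - U_l V_l\|^2 + \alpha \sum_{n=1}^{N_l} \|v_{ln}\|_1 + \frac{\beta}{2} \sum_{n,m=1}^{N_l} \|v_{ln} - v_{lm}\|^2 W^{\mathrm{intra}}_{l,nm} - \frac{\beta}{2} \sum_{n,m=1}^{N_l} \|v_{ln} - v_{lm}\|^2 W^{\mathrm{inter}}_{l,nm} \\
&= \|X_l - U_l V_l\|^2 + \alpha \sum_{n=1}^{N_l} \|v_{ln}\|_1 + \beta \left[ \mathrm{Tr}(V_l L_l^{\mathrm{intra}} V_l^\top) - \mathrm{Tr}(V_l L_l^{\mathrm{inter}} V_l^\top) \right] \\
&= \|X_l - U_l V_l\|^2 + \alpha \sum_{n=1}^{N_l} \|v_{ln}\|_1 + \beta\, \mathrm{Tr}(V_l L_l V_l^\top)
\end{aligned} \tag{4}
$$

Here $R(U_l, V_l)$ is the sparse reconstruction loss of Section 2.1.1 and $M(V_l)$ is the large margin term of Section 2.1.2, both evaluated on $\mathcal{M}_l$. $L_l^{\mathrm{intra}} = D_l^{\mathrm{intra}} - W_l^{\mathrm{intra}}$ and $L_l^{\mathrm{inter}} = D_l^{\mathrm{inter}} - W_l^{\mathrm{inter}}$ are the Laplacian matrices, $D_l^{\mathrm{intra}}$ and $D_l^{\mathrm{inter}}$ are diagonal matrices whose entries are $D^{\mathrm{intra}}_{l,nn} = \sum_{m=1}^{N_l} W^{\mathrm{intra}}_{l,nm}$ and $D^{\mathrm{inter}}_{l,nn} = \sum_{m=1}^{N_l} W^{\mathrm{inter}}_{l,nm}$, respectively, $L_l = L_l^{\mathrm{intra}} - L_l^{\mathrm{inter}}$, and $\beta$ is the trade-off parameter.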
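
The step from the pairwise sums to the trace form relies on the identity $\frac{1}{2}\sum_{n,m} \|v_n - v_m\|^2 W_{nm} = \mathrm{Tr}(V L V^\top)$ for a symmetric weight matrix $W$ with Laplacian $L = D - W$. The following NumPy sketch checks this numerically on random toy data; the helper names `laplacian` and `pairwise_penalty` and the random weights are assumptions made for illustration only.

```python
import numpy as np


def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal matrix of row sums."""
    return np.diag(W.sum(axis=1)) - W


def pairwise_penalty(V, W):
    """0.5 * sum_{n,m} ||v_n - v_m||^2 W[n, m], with codes as columns of V."""
    diff = V[:, :, None] - V[:, None, :]  # shape K x N x N
    return 0.5 * ((diff ** 2).sum(axis=0) * W).sum()


rng = np.random.default_rng(0)
K, N = 3, 6
V = rng.normal(size=(K, N))

# Random symmetric non-negative weights with zero diagonal (toy stand-ins
# for the intra-class and inter-class affinity matrices).
A = np.triu(rng.random((N, N)), 1)
B = np.triu(rng.random((N, N)), 1)
W_intra = A + A.T
W_inter = B + B.T

# Combined Laplacian and the two equivalent forms of the margin term.
L_l = laplacian(W_intra) - laplacian(W_inter)
margin_sum = pairwise_penalty(V, W_intra) - pairwise_penalty(V, W_inter)
margin_trace = np.trace(V @ L_l @ V.T)
```

Note that $L_l$ is a difference of two Laplacians and is therefore not guaranteed to be positive semidefinite, unlike the single-graph Laplacian used in GraphSc.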

With the objective function defined, we formulate the proposed DisScMM as the following optimization problem:

$$
\min_{U_l, V_l} O(U_l, V_l) \quad \text{s.t. } \|u_{lk}\|^2 \le c, \ k = 1, \dots, K. \tag{5}
$$

Note that such an optimization is performed for each manifold to learn a class-conditional codebook and the corresponding codes.

2.2 Optimization 

The optimal $U_l$ and $V_l$ of (5) can be found by following the iterative optimization method introduced in GraphSc [4] or LapSc [6]. An iterative, two-step strategy is adopted to alternately optimize $U_l$ and $V_l$. At each iteration, one of $U_l$ and $V_l$ is optimized while the other is fixed, and then the roles of $U_l$ and $V_l$ are switched. Iterations are repeated until a maximum number of iterations is reached.

2.2.1 On Optimizing Codebooks $U_l$

By fixing $V_l$, the optimization problem (5) is reduced to

$$
\min_{U_l} \|X_l - U_l V_l\|^2 \quad \text{s.t. } \|u_{lk}\|^2 \le c, \ k = 1, \dots, K. \tag{6}
$$

The solution of this problem is introduced in [1] as

$$
U_l^* = X_l V_l^\top \left( V_l V_l^\top + \mathrm{diag}(\lambda^*) \right)^{-1} \tag{7}
$$

where $\lambda = [\lambda_1, \dots, \lambda_K]^\top$, $\lambda_k$ is the Lagrange multiplier associated with the $k$-th inequality constraint $\|u_{lk}\|^2 \le c$, and $\lambda^*$ is the optimal solution of $\lambda$. For more details, we refer the readers to [1].
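
The closed form (7) is straightforward to evaluate once the multipliers are known. The sketch below assumes that $\lambda^*$ has already been obtained (the dual problem that yields it is not shown here) and simply evaluates the formula with a linear solve; the function name `update_codebook` and the toy data are hypothetical.

```python
import numpy as np


def update_codebook(X, V, lam):
    """Evaluate U = X V^T (V V^T + diag(lam))^{-1} for given multipliers lam."""
    gram = V @ V.T + np.diag(lam)
    # Solve U @ gram = X @ V.T rather than forming the inverse explicitly.
    return np.linalg.solve(gram.T, (X @ V.T).T).T


# Toy check (made-up data): with lam = 0 and noiseless X = U0 V,
# the formula recovers U0 exactly.
rng = np.random.default_rng(0)
U0 = rng.normal(size=(4, 3))
V = rng.normal(size=(3, 10))
X = U0 @ V
U_free = update_codebook(X, V, np.zeros(3))
U_shrunk = update_codebook(X, V, np.full(3, 5.0))
```

Larger multipliers shrink the code words, which is how the norm constraints in (6) are enforced at the optimum.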



2.2.2 On Optimizing Sparse Codes $V_l$

By fixing $U_l$, the optimization problem (5) becomes

$$
\min_{V_l} \|X_l - U_l V_l\|^2 + \alpha \sum_{n=1}^{N_l} \|v_{ln}\|_1 + \beta\, \mathrm{Tr}(V_l L_l V_l^\top) \tag{8}
$$

Each code vector $v_{ln}$ is optimized one by one. To optimize $v_{ln}$, we fix all the remaining sparse codes $v_{lm}$ ($m \neq n$). Note that the Laplacian regularizer of the multi-manifold can be rewritten as $\mathrm{Tr}(V_l L_l V_l^\top) = \sum_{n,m=1}^{N_l} L_{l,nm} v_{ln}^\top v_{lm}$. Then (8) is further reduced to

$$
\min_{v_{ln}} \|x_n - U_l v_{ln}\|^2 + \alpha \|v_{ln}\|_1 + \beta \Big( L_{l,nn} v_{ln}^\top v_{ln} + 2 v_{ln}^\top \sum_{m \neq n} L_{l,nm} v_{lm} \Big) \tag{9}
$$



This problem can be solved by the graph regularized sparse code learning algorithm introduced in Algorithm 1 of [4], or by the feature-sign search algorithm introduced in Algorithm 1 of [6]. Here we adopt the one introduced in [4]. In fact, these two algorithms are basically the same except for the initialization procedure. Moreover, the graph regularized sparse code learning algorithm of [4] requires the graph weight matrix to be symmetric, while the other one does not.
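
To make the structure of the single-code subproblem (9) concrete, the sketch below minimizes it with a plain proximal gradient (ISTA) iteration. This is a deliberately simpler, swapped-in technique and not the feature-sign based solver of [4] or [6]. It assumes a symmetric $L_l$, writes `h` for the fixed vector $\sum_{m \neq n} L_{l,nm} v_{lm}$, and uses hypothetical names (`ista_code`, `objective`) and made-up toy data.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def objective(v, x, U, alpha, beta, L_nn, h):
    """Value of (9) for one code v, with h = sum_{m != n} L_nm v_m."""
    smooth = np.sum((x - U @ v) ** 2) + beta * (L_nn * v @ v + 2.0 * v @ h)
    return smooth + alpha * np.abs(v).sum()


def ista_code(x, U, alpha, beta, L_nn, h, n_iter=200):
    """Proximal gradient sketch for one sparse code; returns (v, history)."""
    v = np.zeros(U.shape[1])
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lip = 2.0 * (np.linalg.norm(U, 2) ** 2 + beta * abs(L_nn))
    history = [objective(v, x, U, alpha, beta, L_nn, h)]
    for _ in range(n_iter):
        grad = 2.0 * U.T @ (U @ v - x) + 2.0 * beta * (L_nn * v + h)
        v = soft_threshold(v - grad / lip, alpha / lip)
        history.append(objective(v, x, U, alpha, beta, L_nn, h))
    return v, history


# Toy problem (made up): a 2-sparse code in an 8-word codebook.
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 8))
v_true = np.zeros(8)
v_true[[1, 4]] = [2.0, -1.5]
x = U @ v_true
h = 0.1 * rng.normal(size=8)
v_hat, history = ista_code(x, U, alpha=0.1, beta=0.1, L_nn=1.5, h=h)
```

With the step size `1 / lip` the objective value does not increase from one iteration to the next, even when $L_{l,nn}$ is negative; the feature-sign approach is typically preferred in practice because it exploits the sparsity pattern directly.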

The learning procedure of the DisScMM algorithm is summarized in Algorithm 1.



2.3 Classifier of DisScMM 

Different from traditional Sc methods, which can only be used to represent the data, DisScMM can also make use of the discriminative nature of sparse coding on multi-manifold to perform classification. When a new data point $x_t$ comes in, we match it to all the manifolds and then assign it to the class with the minimum matching error. Assuming $x_t$ belongs to the $l$-th class, we first calculate its intra-class nearest neighbors $\mathcal{N}_t^{\mathrm{intra}}$ and inter-class nearest neighbors $\mathcal{N}_t^{\mathrm{inter}}$ from $\mathcal{M}_l$. We also suppose that adding this new data point has no effect on the discriminative graphs or the sparse codes of $\mathcal{M}_l$, so the sparse codes $v_{ln}$ for $x_n \in \mathcal{M}_l$ are fixed. Then the matching error between $x_t$ and $\mathcal{M}_l$ is defined as

$$
\varepsilon_l(x_t) = \min_{v_{lt}} \|x_t - U_l v_{lt}\|^2 + \alpha \|v_{lt}\|_1 + \beta \sum_{n=1}^{N_l} \|v_{lt} - v_{ln}\|^2 W^{\mathrm{intra}}_{l,tn} - \beta \sum_{n=1}^{N_l} \|v_{lt} - v_{ln}\|^2 W^{\mathrm{inter}}_{l,tn} \tag{10}
$$

Algorithm 1 The learning procedure of the DisScMM algorithm.

INPUT: Training sets $\mathcal{M}_1, \dots, \mathcal{M}_L$ of the $L$ classes of the multi-manifold;
for $l = 1, \dots, L$ do
    Construct the discriminative graph weight matrices as in (2) and the corresponding Laplacian matrix $L_l$ for the $l$-th manifold.
    Initialize the class-conditional codebook $U_l^0$ and sparse codes $V_l^0$ for the $l$-th manifold by performing Sc on $\mathcal{M}_l$.
    for $t = 1, \dots, T$ do
        for $n = 1, \dots, N_l$ do
            Update the sparse code $v_{ln}^t$ while fixing $v_{lm}$, $m \neq n$, and $U_l^{t-1}$, by solving (9) for the $l$-th manifold.
        end for
        Update the codebook $U_l^t$ while fixing $V_l^t$ by (7) for the $l$-th manifold.
    end for
end for
OUTPUT: The final class-conditional codebooks $U_l^T$ and sparse codes $V_l^T$, $l = 1, \dots, L$.



where $W^{\mathrm{intra}}_{l,tn}$ and $W^{\mathrm{inter}}_{l,tn}$ are the intra-class and inter-class similarities of $x_t$ to the $n$-th data point of $\mathcal{M}_l$, which are calculated by (2). This optimization problem can also be solved by the algorithm proposed in [4]. We finally assign a label $y_t$ to $x_t$ as follows:

$$
y_t \leftarrow l^* = \arg\min_{l \in \mathcal{L}} \varepsilon_l(x_t) \tag{11}
$$

The classification procedure is summarized in Algorithm 2.

Algorithm 2 The classification procedure of the DisScMM algorithm.

INPUT: Training sets $\mathcal{M}_1, \dots, \mathcal{M}_L$ of the $L$ classes of the multi-manifold;
INPUT: The class-conditional codebooks $U_l$ and sparse codes $V_l$ for the $L$ manifolds, $l = 1, \dots, L$.
INPUT: The input unlabeled data point $x_t$.
for $l = 1, \dots, L$ do
    Extend the discriminative graph weight matrices by adding $x_t$ as in (2) and compute the corresponding Laplacian matrix $L_l$ for the $l$-th manifold.
    Compute the matching error $\varepsilon_l(x_t)$ of $x_t$ to $\mathcal{M}_l$ as in (10).
end for
Classify $x_t$ into the $l^*$-th class with the minimum matching error as in (11).
OUTPUT: The class label $y_t$ of $x_t$.
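
The classification rule can be sketched as follows. This is a rough illustration under several assumptions: the matching error (10) is approximated with a proximal gradient loop rather than the solver of [4], both graph terms are weighted by $\beta$, the similarity vectors of $x_t$ to each manifold are supplied by hand instead of being computed from (2), and the names `matching_error`, `classify` and `models` are hypothetical.

```python
import numpy as np


def matching_error(x_t, U, V, w_intra, w_inter, alpha=0.1, beta=0.1, n_iter=300):
    """Approximate eps_l(x_t) of (10) by proximal gradient on the code of x_t."""
    d = w_intra - w_inter  # signed weights to the fixed codes (columns of V)
    v = np.zeros(U.shape[1])
    # Upper bound on the Lipschitz constant of the smooth part's gradient.
    lip = 2.0 * (np.linalg.norm(U, 2) ** 2 + beta * np.abs(d).sum()) + 1e-12
    for _ in range(n_iter):
        grad = 2.0 * U.T @ (U @ v - x_t) + 2.0 * beta * (d.sum() * v - V @ d)
        z = v - grad / lip
        v = np.sign(z) * np.maximum(np.abs(z) - alpha / lip, 0.0)
    graph = beta * (d * ((v[:, None] - V) ** 2).sum(axis=0)).sum()
    return np.sum((x_t - U @ v) ** 2) + alpha * np.abs(v).sum() + graph


def classify(x_t, models, **kw):
    """models: {label: (U, V, w_intra, w_inter)}; returns (argmin label, errors)."""
    errors = {l: matching_error(x_t, *m, **kw) for l, m in models.items()}
    return min(errors, key=errors.get), errors


# Toy models (made up): class 1 spans the first two axes of R^4, class 2 the last two.
U1, U2 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
V1 = np.array([[1.0, 1.0], [0.5, 0.5]])  # fixed codes of two training points
V2 = np.zeros((2, 2))
models = {
    1: (U1, V1, np.array([0.5, 0.5]), np.zeros(2)),
    2: (U2, V2, np.zeros(2), np.array([0.1, 0.1])),
}
x_t = np.array([1.0, 0.5, 0.0, 0.0])
label, errors = classify(x_t, models)
```

In this toy setting the test point lies in the span of the class 1 codebook, so its matching error to class 1 is much smaller than to class 2.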

3 Experiments



In this section, we will evaluate the proposed method on two challenging data 
classification tasks. 

3.1 Experiment I: Identifying Somatic Mutations 

Profiling tumours for single nucleotide variant (SNV) somatic mutations using next-generation sequencing (NGS) technology plays an important role in the study of cancer genomes [7]. In this experiment, we evaluate DisScMM on the task of inferring somatic mutations from paired tumour/normal NGS data.

3.1.1 Database and Setup 

Two independent datasets are used to train and test the performance of the 
DisScMM method for somatic mutation identification. 

Training Set: The training dataset is selected from exome capture data containing 3369 variants, which were predicted using only allelic counts and liberal thresholds [7]. Further re-sequencing experiments revalidated 1015 somatic mutations, 471 germline and 1883 wild-type positions. Our selected training data set contains 800 somatic mutations and 1800 non-somatic mutations (germline and wild-type).

Test Set: The test dataset is selected from whole genome shotgun data containing 113 somatic mutations, 57 germline mutations and 337 wild-types [7]. These positions are deliberately held out of the training data so that the test set and the training set are completely independent of each other. We select 90 somatic mutations and 300 non-somatic mutations to construct the test set.

Given the $i$-th candidate mutation site of the genome in the dataset, it is represented by a feature vector $x_i$ with 106 feature components constructed from both the tumor and normal data as in [7]. The somatic mutation identification problem is to predict the label $y_i$ of the site represented by these features, where $y_i$ is defined as

$$
y_i =
\begin{cases}
1, & \text{if the } i\text{-th site is a somatic mutation,} \\
2, & \text{if the } i\text{-th site is a non-somatic mutation.}
\end{cases} \tag{12}
$$

To predict the class labels in the test set, we first learn the codebooks for the somatic mutation manifold and the non-somatic mutation manifold using the training set for DisScMM. For the learning procedure, we apply a 10-fold cross-validation analysis to find the optimal hyper-parameters. Then the learned DisScMM model is applied to the independent test set to classify each candidate mutation site as a somatic or non-somatic mutation. Some competing algorithms, including Sc [1], GraphSc [4] and LapSc [6], are also tested as mutation representation methods.

To evaluate the classification results, we employ the recall, precision, accuracy, F-score and Matthews correlation coefficient (MCC) as metrics. Recall, precision and accuracy are defined as

$$
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}, \tag{13}
$$

where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives, respectively. The F-score is the harmonic mean of recall and precision, defined as

$$
\text{F-score} = 2 \times \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}. \tag{14}
$$

Recall, precision, accuracy and the F-score lie between 0 and 1, and a classifier with larger values performs better. The MCC is given by

$$
\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FN)(TN + FP)}}. \tag{15}
$$

The MCC value lies between -1 and 1. A perfect classifier has MCC = 1, a random predictor has MCC = 0, and a perfectly inverted predictor has MCC = -1.
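
The five metrics can be computed directly from the confusion counts, as in the following sketch. The function name `binary_metrics`, the convention of returning 0 when a denominator is zero, and the toy counts are assumptions made for illustration; the counts are not results from the experiments.

```python
import math


def _ratio(num, den):
    """Safe division; returning 0.0 for an empty denominator is a convention."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Recall, precision, accuracy, F-score and MCC from confusion counts."""
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fn + tn + fp)
    f_score = _ratio(2 * recall * precision, recall + precision)
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    mcc = _ratio(tp * tn - fp * fn, denom)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
        "mcc": mcc,
    }


# Toy counts (made up, not taken from the paper's experiments).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because the MCC uses all four counts, it is less sensitive than accuracy to the class imbalance present in the somatic mutation data.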



3.1.2 Results 

The boxplots of the recalls, precisions, accuracies, F-scores and MCCs of the 10-fold cross-validation on the training data set are shown in Fig. 1(a)-(e), respectively. We observe that for all performance measures, DisScMM outperforms the baseline methods significantly in terms of both the median and the quartile values. We also observe that the unsupervised sparse coding methods based on a single general graph, i.e. GraphSc and LapSc, have comparable performance to each other. From these figures, it is not very surprising to see that the original Sc provides the worst performance, since Sc ignores the locality of the data points.

Fig. 2 summarizes the recalls, precisions, accuracies, F-scores and MCCs of the proposed DisScMM and its competitors on the independent test dataset. According to Fig. 2, we first observe a significant difference between the recall and precision scores for all the methods, which is consistent with the observations in the 10-fold cross-validation on the training dataset. A possible reason is the significantly unbalanced numbers of positive and negative samples. Second, we observe that in all cases DisScMM outperforms GraphSc, LapSc and Sc significantly. Fig. 2 also shows that the sparse coding methods with manifold regularization outperform sparse coding without it. Our DisScMM based somatic mutation identification method outperforms both the GraphSc and LapSc based methods and achieves the best somatic mutation identification performance of all the methods, which demonstrates the effectiveness of DisScMM for this task. Moreover, GraphSc and LapSc achieve much better performance than the original Sc, which demonstrates the usefulness of regularizing the sparse codes with nearest neighbor graphs.

Figure 1: Boxplots of recalls, precisions, accuracies, F-scores and MCCs of 10-fold cross-validation on the training set of somatic mutation identification.

Figure 2: The recalls, precisions, accuracies, F-scores and MCCs on the test set of somatic mutation identification.



3.2 Experiment II: Breast Tumor Classification in Ultrasonic Images

Medical examination based on ultrasound imaging is indispensable for the early detection and treatment of breast cancers [5]. Thus, developing an automated differential diagnosis system that classifies a given breast tumor as benign or malignant plays an important role in modern medical examination. In this experiment, we evaluate the performance of DisScMM on the task of breast tumor classification in ultrasonic images.

3.2.1 Dataset and Setup 

We collected 340 ultrasound images for the evaluation of the proposed tumor classification method. Each ultrasound image includes a biopsy-proven tumor (a carcinoma, a fibroadenoma, or a cyst), where a carcinoma is a malignant tumor while fibroadenomas and cysts are benign tumors. The tumor border is delineated manually. The data set contains 220 carcinomas, 60 fibroadenomas, and 60 cysts.

Given an ultrasound image, we extract 208 features and present them in a feature vector x. The 208 features consist of the K-related and conventional features, covering all of the diagnostic observations [8]. The classification problem is to differentiate three types of lesions (carcinoma, fibroadenoma, and cyst). For validation, we conducted a 5-fold cross-validation test. The data set is first divided randomly into 5 subsets; 4 subsets are then used for training, and the remaining subset is used to test the proposed DisScMM, GraphSc, LapSc, and Sc methods. The cross-validation process is repeated 5 times, with each of the 5 subsets used exactly once as the test data.
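
The 5-fold protocol described above can be sketched with the standard library alone. The function name `k_fold_indices` and the fixed random seed are assumptions for illustration; the split here is purely random, as stated in the text, and is not stratified by tumor type.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each index is in a test fold exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal random subsets
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 340 images split into 5 folds of 68 images each.
splits = list(k_fold_indices(340, k=5))
```

Each `(train, test)` pair would be used to fit the class-conditional codebooks on `train` and report accuracy on `test`.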

3.2.2 Results 

Fig. 3 shows the boxplots of the classification accuracies obtained by different methods on the ultrasonic breast tumor image dataset. As shown in Fig. 3, our DisScMM method achieves much better results under the 5-fold cross-validation protocol than the state-of-the-art sparse coding methods. Specifically, DisScMM outperforms almost all of the compared sparse coding methods across the different tumor classes. There are two possible reasons why our DisScMM method is superior to these methods:

1. our supervised method explores the discriminative information explicitly by multi-manifold regularization, while most state-of-the-art sparse coding methods are intrinsically unsupervised, even though they can extract some discriminative information from the graph model;

2. our method codes the features in a supervised manner by using class-conditional codebooks and a multi-manifold regularizer, while the others code the features in an unsupervised, general way.

Figure 3: Boxplots of accuracies of different tumors on the ultrasonic breast tumor images set.

4 Conclusion and Future Work 



In this paper we have proposed a novel discriminative sparse coding method to address the data representation and classification problem. Multiple manifolds are constructed for the subsets of different classes. Class-conditional sparse coding is conducted to maximize the manifold margins of different classes. Experimental results on two challenging tasks demonstrate the efficacy of the proposed approach.

In the future, we are interested in designing multi-manifold regularized non-negative matrix factorization (NMF) [12] by exploring the class labels to improve the data representation. Moreover, how to utilize the coding results to further refine the manifold models appears to be another interesting direction of future work. DisScMM can also be applied to bioinformatics [13-18], medical imaging [19-22], biometrics [23-32] and computer vision [33-35].